University of Pittsburgh
Abstract

GIPS is a problem-solving system that models the strategy shifts of children learning to add. The system uses a generalized form of means-ends analysis as its reasoning algorithm, and it learns probabilistic selection and execution concepts for its operators. With this combination, GIPS models the “SUM-to-MIN” transition that children exhibit when learning to add (Siegler & Jenkins, 1989). The system generates the appropriate final strategy, as well as the intermediate strategies that Siegler and Jenkins observed.

INTRODUCTION 

Siegler and Jenkins (1989) have identified a number of distinct strategies that children exhibit when learning to add two numbers on their hands. This paper reports a model of the acquisition of these strategies in a computational problem-solving system. This model provides a testable theory of the cognitive mechanisms involved in learning to add. In addition, it has helped us identify some of the types of learning events and mechanisms that may be involved in general strategy acquisition.

The paper begins with a description of a computational problem solver called GIPS (General Inductive Problem Solver), which uses a generalized form of means-ends analysis (MEA) (Jones, 1989). Its learning mechanism is based on Schlimmer’s (1987; Schlimmer & Granger, 1986a, 1986b) Stagger system, which uses probabilistic induction to learn concepts from examples. The basic version of GIPS uses this algorithm to learn search-control knowledge (i.e., knowledge about when operators should be selected), much like other problem-solving systems that learn (e.g., Sage, Langley, 1985; Soar, Laird, Rosenbloom, & Newell, 1986; Prodigy, Minton, 1988/1989). However, the enhanced version of GIPS also adjusts its representation of when operators can be executed. We believe that this ability is key to some of the strategy changes that people exhibit.

After describing the system, we present GIPS’ account of the sequence of addition strategies that Siegler and Jenkins found in children. GIPS successfully models the strategy shifts through a combination of its general learning algorithm and simple changes to its operator representations. Toward the end of the paper, we discuss our results.


THE GENERAL INDUCTIVE PROBLEM SOLVER

GIPS can be classified as a problem-solving system that learns from its experiences. However, it differs notably from other problem solvers in a number of ways. Chief among these are its use of a generalized form of means-ends analysis as the main planning algorithm, and a learning mechanism based on probabilistic reinforcement rather than on a symbolic, analytical approach. In this section we describe the details of GIPS’ planning and learning algorithms, together with its representation of operators and abstract problem descriptions.

THE  PLANNING  ALGORITHM 

In our research we have developed two versions of GIPS. To simplify the discussion, we will first describe the basic version of the system in detail. Later, we will discuss some of the additions we made to develop a more complete model of strategy acquisition. The planning algorithm consists of two functions, TRANSFORM and APPLY, which behave much like those of other systems that use means-ends analysis. TRANSFORM attempts to satisfy what we call a TRANSFORM goal: to change the current state into a state that satisfies some goal conditions. The current state and goal conditions are both represented as sets of relations over objects, where some of the objects may be variables. The function satisfies goals by first APPLYing an operator and then recursively TRANSFORMing the resulting state. To APPLY an operator to the current state, the system must first TRANSFORM the current state into a state that satisfies the preconditions of the operator, and then EXECUTE the operator. We do not have the space to discuss the overall planning scheme in detail, so we will focus on the aspects that distinguish GIPS from other systems.
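The mutual recursion between TRANSFORM and APPLY can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not GIPS itself: states and goals are sets of ground literals (no variables), operators carry only precondition, add, and delete sets, a fixed depth bound stands in for real search control, and operator choice uses the classic means-ends relevance test rather than GIPS’ learned selection concepts. The operator and literal names in the toy domain are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset
    adds: frozenset
    deletes: frozenset = frozenset()


def execute(state, op):
    """EXECUTE: apply the operator's delete and add lists to the state."""
    return (state - op.deletes) | op.adds


def apply_op(state, op, operators, depth):
    """APPLY: TRANSFORM the state until op's preconditions hold, then EXECUTE op."""
    sub = transform(state, op.preconditions, operators, depth)
    if sub is None:
        return None
    pre_state, plan = sub
    return execute(pre_state, op), plan + [op.name]


def transform(state, goals, operators, depth=6):
    """TRANSFORM: return (state, plan) where state satisfies goals, or None."""
    if goals <= state:
        return state, []
    if depth == 0:
        return None
    for op in operators:
        # Classic MEA relevance test (op must add an unmet goal); GIPS instead
        # lets any operator be chosen and scores it with a selection concept.
        if not (op.adds & (goals - state)):
            continue
        applied = apply_op(state, op, operators, depth - 1)
        if applied is None:
            continue
        new_state, plan = applied
        rest = transform(new_state, goals, operators, depth - 1)
        if rest is not None:
            final_state, more = rest
            return final_state, plan + more
    return None


# Hypothetical toy domain: assign an addend, raise fingers, then count them.
ops = [
    Operator("assign", frozenset(), frozenset({"assigned-left"})),
    Operator("raise-left", frozenset({"assigned-left"}), frozenset({"left-raised"})),
    Operator("count", frozenset({"left-raised"}), frozenset({"answer"})),
]
final_state, plan = transform(frozenset(), frozenset({"answer"}), ops)
print(plan)  # ['assign', 'raise-left', 'count']
```

The subgoal for `count` (its precondition) is solved by a nested TRANSFORM inside APPLY, which is the recursive structure the paragraph describes.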

SELECTION OF OPERATORS FROM MEMORY

As we have stated, this approach to problem solving is similar to a form of means-ends analysis (Ernst & Newell, 1969; Newell & Simon, 1972). However, GIPS uses a generalization of the standard approach, borrowed from the Eureka system (Jones, 1989). Rather than always selecting operators whose actions mention the current goals, GIPS can in principle select any operator in memory at any time. To determine which operators will actually be selected, GIPS borrows a probabilistic approach used by the Prospector expert system (Duda, Gaschnig, & Hart, 1979) and by Schlimmer’s (1987; Schlimmer & Granger, 1986a, 1986b) Stagger system, which learns concept descriptions from examples. In the standard language of learning from examples, we say that each operator has an associated concept that predicts when it would be useful to select that operator. For the remainder of this paper, we will refer to this as the selection concept for an operator.

A TRANSFORM goal encountered during problem solving can be classified as either a positive or a negative instance of a selection concept. Concepts and instances are both represented as sets of literals (i.e., the relations that appear in TRANSFORM goals), which are matched in an attempt to classify the instances among the concepts. Finally, each literal in a concept has two associated values: sufficiency represents how much the presence of the literal predicts a positive instance of the concept, and necessity represents how much the absence of the literal predicts a negative instance of the concept.

When a new TRANSFORM goal is encountered, the literals of that goal are matched against the literals of each operator’s selection concept to determine which of the concept’s literals are present and which are absent. Once matching has occurred, the system calculates a prediction value that represents the odds that the current instance is a positive example of the concept. The formula used for prediction is

$$\mathrm{Odds}(I \in C) = \mathrm{Odds}(C) \prod_{f_j \mapsto I} S_j \prod_{f_j \not\mapsto I} N_j$$

where $I \in C$ means that instance $I$ is a positive example of concept $C$, $f_j \mapsto I$ means that literal $f_j$ of the concept matches a literal in $I$, and $S_j$ and $N_j$ are the sufficiency and necessity scores of $f_j$. Thus, the final prediction score represents the odds that instance $I$ is a positive example of concept $C$; it is the product of the prior odds for the concept, the sufficiency scores of all the concept literals that are matched by the instance, and the necessity scores of all the literals that are not matched by the instance. If the odds are greater than one, the instance is likely a positive example of the concept. In other words, given the current TRANSFORM goal, it is useful to attempt to APPLY the operator associated with this concept.
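For ground literals, the prediction formula and the “odds greater than one” selection rule reduce to a short loop. The sketch below assumes each concept stores its prior odds plus a (sufficiency, necessity) pair per literal; the class and function names, literal names, and numeric weights are illustrative, not taken from GIPS.

```python
from dataclasses import dataclass, field


@dataclass
class SelectionConcept:
    prior_odds: float
    # literal -> (sufficiency S_j, necessity N_j)
    weights: dict = field(default_factory=dict)


def predict_odds(concept, instance):
    """Odds(I in C) = Odds(C) * prod S_j over matched * prod N_j over unmatched."""
    odds = concept.prior_odds
    for literal, (s, n) in concept.weights.items():
        odds *= s if literal in instance else n
    return odds


def select_operators(concepts, goal):
    """Names of operators whose odds exceed 1, most promising first."""
    scored = sorted(((predict_odds(c, goal), name) for name, c in concepts.items()),
                    reverse=True)
    return [name for odds, name in scored if odds > 1]


# Hypothetical weights for two operators (ground literals, no variables).
concepts = {
    "count": SelectionConcept(0.5, {"goal:answer": (10.0, 0.2), "left-raised": (2.0, 0.5)}),
    "raise": SelectionConcept(0.5, {"goal:left-raised": (10.0, 0.2)}),
}
goal = {"goal:answer"}
print(predict_odds(concepts["count"], goal))  # 2.5
print(select_operators(concepts, goal))       # ['count']
```

Every operator is scored, which mirrors the point that any operator in memory can in principle be selected; only the odds decide which ones are tried.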

In Stagger, matching is trivial because it uses a propositional representation. However, GIPS allows predicates, which makes matching more difficult. Each literal is a relation with a number of arguments, some of which may be variables. In addition, a relation can appear multiple times in an instance, each time with different argument values. Thus, there are generally multiple possible matches between an instance and a concept. Given a particular instance and concept, GIPS finds all the maximal partial matches of the literals in the instance to the literals in the concept. As a result, there is one prediction score for each such instantiation of the instance to the concept.
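The relational matching step can be illustrated with a brute-force search. In this hypothetical sketch, literals are tuples such as ("Counter-value", "=Value"), terms beginning with "=" are variables (following the notation of Table 1), and instance literals are ground. It enumerates every consistent partial match and keeps those not strictly contained in a compatible larger one. Unlike a careful implementation, it does not prevent two concept literals from matching the same instance literal, and its cost grows exponentially with concept size.

```python
def is_var(term):
    """Variables carry a leading '=' (e.g. '=Value')."""
    return isinstance(term, str) and term.startswith("=")


def unify(concept_lit, inst_lit, binding):
    """Extend binding so concept_lit matches the ground inst_lit, else None."""
    if concept_lit[0] != inst_lit[0] or len(concept_lit) != len(inst_lit):
        return None
    new = dict(binding)
    for c, i in zip(concept_lit[1:], inst_lit[1:]):
        if is_var(c):
            if new.setdefault(c, i) != i:   # variable already bound elsewhere
                return None
        elif c != i:
            return None
    return new


def partial_matches(concept_lits, instance):
    """Yield (binding, matched literals) for every consistent partial match."""
    def search(k, binding, matched):
        if k == len(concept_lits):
            yield binding, matched
            return
        lit = concept_lits[k]
        for inst_lit in instance:
            extended = unify(lit, inst_lit, binding)
            if extended is not None:
                yield from search(k + 1, extended, matched | {lit})
        yield from search(k + 1, binding, matched)  # leave lit unmatched
    yield from search(0, {}, frozenset())


def maximal_matches(concept_lits, instance):
    """Keep matches not strictly contained in a compatible larger match."""
    found = []
    for item in partial_matches(concept_lits, instance):
        if item not in found:
            found.append(item)

    def dominated(a, b):
        return a[1] < b[1] and a[0].items() <= b[0].items()

    return [a for a in found if not any(dominated(a, b) for b in found)]


concept = [("Raised-fingers", "Lefthand", "=Value"), ("Counter-value", "=Value")]
mismatch = [("Raised-fingers", "Lefthand", 2), ("Counter-value", 3)]
agree = [("Raised-fingers", "Lefthand", 2), ("Counter-value", 2)]
print(len(maximal_matches(concept, mismatch)))  # 2: one per binding of =Value
print(len(maximal_matches(concept, agree)))     # 1: both literals match together
```

Each surviving match would then be scored separately with the prediction formula, giving one prediction per instantiation.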

LEARNING IN GIPS

In the basic version of GIPS, learning involves changing the system’s selection behavior. GIPS accomplishes this by first assigning credit and blame to the operators it has attempted to APPLY from each TRANSFORM goal. The TRANSFORM goal is classified as a positive instance of the selection concept for any operator that led to a solution from that goal, and as a negative instance for operators that branch off the solution path. The algorithm for storing examples in GIPS is based on the one used in Stagger. We have generalized Schlimmer’s algorithm to handle a relational representation, but the current version of GIPS does not include Stagger’s constructive induction techniques. The learning algorithm depends on the statistical nature of the sufficiency and necessity scores for each literal. These scores are derived from conditional probabilities using the following formulas:

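$$S_j = \frac{P(f_j \mapsto I \mid I \in C)}{P(f_j \mapsto I \mid I \notin C)}$$

$$N_j = \frac{P(f_j \not\mapsto I \mid I \in C)}{P(f_j \not\mapsto I \mid I \notin C)}$$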

The learning algorithm updates estimates of each of these conditional probabilities based on the presence of literals in the new instance and the classification of the instance as positive or negative. As the estimates change, so does GIPS’ selection behavior on future problems. GIPS also augments its concept descriptions when a new instance contains literals that are not already present in the concept. It simply adds those literals to the concept description and, lacking any other knowledge about their importance, initializes their probability estimates to represent statistical independence (that is, sufficiency and necessity scores of 1).

ADAPTING GIPS FOR STRATEGY ACQUISITION

The implementation of GIPS that we have described so far is capable of learning search-control knowledge, which indicates when and in what order operators should be selected in new situations. In this version of GIPS, each operator has an associated probabilistic selection concept and a set of preconditions that specify when the operator can execute, among other things.

To enhance GIPS, we added to each operator a second probabilistic concept description that represents when the system thinks the operator should be able to execute. To execute an operator, the system no longer checks whether the preconditions are satisfied. Rather, it makes a probabilistic prediction based on the literals that are true in the current state, and it attempts to execute the operator when the prediction value is greater than 1.

To learn these new execution concepts, the system cannot assign credit and blame by itself without feedback from the outside world. Therefore, every time the system decides that an operator should execute, it asks the user to confirm its prediction. If the prediction is correct, the system stores a positive example for this operator’s execution concept and executes the operator. If the prediction is incorrect, the system stores a negative example.
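The confirm-before-executing protocol can be sketched as follows. The `confirm` callback stands in for the user, the prior pseudo-counts and literal names are hypothetical, and examples are stored only when the system actually attempts an execution, as the text above describes.

```python
class ExecutionConcept:
    """Probabilistic guess at when an operator can execute (illustrative only)."""

    def __init__(self, prior_pos=1, prior_neg=1):
        # Hypothetical pseudo-counts for the prior odds of executability.
        self.n_pos, self.n_neg = prior_pos, prior_neg
        # literal -> [pos_present, pos_total, neg_present, neg_total]
        self.stats = {}

    def observe(self, state, positive):
        for lit in state:
            self.stats.setdefault(lit, [1, 2, 1, 2])  # S = N = 1 when new
        offset = 0 if positive else 2
        for lit, counts in self.stats.items():
            counts[offset] += 1 if lit in state else 0
            counts[offset + 1] += 1
        if positive:
            self.n_pos += 1
        else:
            self.n_neg += 1

    def odds(self, state):
        result = self.n_pos / self.n_neg
        for lit, (pp, pt, npr, nt) in self.stats.items():
            p_pos, p_neg = pp / pt, npr / nt
            result *= p_pos / p_neg if lit in state else (1 - p_pos) / (1 - p_neg)
        return result


def try_execute(concept, state, confirm):
    """Ask the user (confirm) only when the predicted odds exceed 1, then store
    the answer as a positive or negative example."""
    if concept.odds(state) <= 1:
        return False
    ok = confirm(state)
    concept.observe(state, ok)
    return ok


asked = []


def user(state):
    """Stands in for the human who confirms or rejects each attempt."""
    asked.append(state)
    return "count-equals-addend" in state


ready = frozenset({"counting", "count-equals-addend"})
early = frozenset({"counting"})
concept = ExecutionConcept(prior_pos=2)
results = [try_execute(concept, s, user) for s in [ready, early] * 3]
print(results)     # [True, False, True, False, True, False]
print(len(asked))  # 4: after one rejection, "early" states are no longer attempted
```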

In addition to learning probabilistic execution concepts for operators, the enhanced version of GIPS handles two types of learning events that involve the preconditions of the operators. It is important for the system to be able to change the preconditions, because these literals are set up as subgoals when the operator cannot immediately execute. This affects the selection of other operators as the system continues to work on a problem. The first type of learning event occurs when the sufficiency value for a literal in the execution concept reaches a threshold. At this point, that literal is added as a precondition of the operator.

The second type of learning event occurs when the system successfully executes an operator even though not all of its preconditions are satisfied. In this case, the system removes the offending relations from the preconditions, because it has found the preconditions to be an incorrect symbolic description of the execution concept. It is interesting to note that neither of these learning events is “impasse-driven,” but together they allow the system to gradually shift its representation of the domain it works in. These shifts manifest themselves as strategy changes when solving problems.
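Applied to the literals of Table 1, the two precondition-learning events reproduce the progression from (a) to (d). In this sketch, literals are opaque strings whose variable bindings are assumed to be resolved already, and the sufficiency scores and the threshold of 10 are hypothetical values chosen for illustration; the text does not give GIPS’ actual threshold.

```python
def add_sufficient_literals(preconditions, sufficiency, threshold):
    """Event 1: promote execution-concept literals whose sufficiency reached threshold."""
    return preconditions | {lit for lit, s in sufficiency.items() if s >= threshold}


def drop_unsatisfied(preconditions, state, executed_ok):
    """Event 2: after a successful execution, drop preconditions that were false."""
    return preconditions & state if executed_ok else preconditions


# Table 1 literals as opaque labels (bindings of =Value assumed resolved).
RAISING, COUNTING = "Raising(Lefthand)", "Counting(Lefthand)"
ASSIGNED, COUNTER = "Assigned(Lefthand,=Value)", "Counter-value(=Value)"
RAISED = "Raised-fingers(Lefthand,=Value)"

pre_a = frozenset({RAISING, COUNTING, ASSIGNED, COUNTER})
# (a) -> (b): hypothetical sufficiency scores against a hypothetical threshold.
pre_b = add_sufficient_literals(pre_a, {RAISED: 12.0, RAISING: 1.1}, threshold=10.0)
# (b) -> (c): premature execution succeeds before the counter reaches the goal value.
pre_c = drop_unsatisfied(pre_b, frozenset({RAISING, COUNTING, ASSIGNED, RAISED}), True)
# (c) -> (d): execution with no fingers raised also succeeds.
pre_d = drop_unsatisfied(pre_c, frozenset({RAISING, COUNTING, ASSIGNED}), True)
print(sorted(pre_d))  # the three literals of Table 1(d)
```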

REPRESENTATION OF THE ADDITION DOMAIN

The last system details concern GIPS’ representation of the domain. GIPS describes the world as a set of relations between objects. In the addition domain, these objects and relations include the numbers that are part of the problem, the state of the problem solver’s “hands” while it is adding, and the value of a counter that the problem solver keeps “in its head.” The system also has a set of operators that simulate the solution of addition problems by novice problem solvers. Each operator includes a set of preconditions, add conditions, delete conditions, and possibly a set of constraints on variable bindings.

GIPS requires sixteen operators to represent the addition domain. Two particular operators, which we refer to as the END-COUNT operators, are involved in most of the strategy shifts. For future reference, the series of preconditions that the LEFT-END-COUNT operator acquires appears in Table 1. In addition to supplying the system with the operators, we initialized their selection concepts so that the system generates the elementary adding strategy. We accomplished this by setting the literals of each operator’s selection concept to be the preconditions and the goals that the operator could satisfy. Then, we initialized the conditional probabilities on these literals so that each operator would be selected in either a backward-chaining or a forward-chaining fashion, depending on its role in the domain.

Table 1. A Series of Preconditions for LEFT-END-COUNT.

SUM strategy (a):
Raising(Lefthand)
Counting(Lefthand)
Assigned(Lefthand,=Value)
Counter-value(=Value)

SUM strategy (b):
Raising(Lefthand)
Counting(Lefthand)
Assigned(Lefthand,=Value)
Counter-value(=Value)
Raised-fingers(Lefthand,=Value)

SHORTCUT SUM strategy (c):
Raising(Lefthand)
Counting(Lefthand)
Assigned(Lefthand,=Value)
Raised-fingers(Lefthand,=Value)

FIRST strategy (d):
Raising(Lefthand)
Counting(Lefthand)
Assigned(Lefthand,=Value)

To be more precise, the backward-chaining operators were initialized so that the literals representing goals were all highly sufficient for selection. Thus, they would be selected any time one of the system’s current goals matched an operator action. For the forward-chaining operators, all of the literals representing the operator preconditions were initialized as highly necessary, so these operators would not be selected unless all of their preconditions could be matched by the current state. The combination of forward-chaining and backward-chaining operators allows the system to generate more complex (and more psychologically plausible) reasoning behavior than a strictly forward-chaining or means-ends-analysis system would allow.
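The effect of this initialization can be checked directly against the prediction formula. The sketch below uses hypothetical numbers and literal names: prior odds of 0.5, a goal literal with sufficiency 20 for a backward-chaining operator, and three precondition literals with sufficiency 2 and necessity 0.05 for a forward-chaining one. Missing any single precondition drives the forward-chaining operator’s odds below 1.

```python
def selection_odds(prior, weights, instance):
    """Prior odds times sufficiency for matched literals, necessity for unmatched."""
    odds = prior
    for literal, (s, n) in weights.items():
        odds *= s if literal in instance else n
    return odds


PRIOR = 0.5  # hypothetical prior odds for both operators

# Backward chaining: the goal literal is highly sufficient; its absence is neutral.
backward = {"goal:answer-known": (20.0, 1.0)}

# Forward chaining: every precondition literal is highly necessary (N far below 1).
preconditions = {"precondition-1", "precondition-2", "precondition-3"}
forward = {lit: (2.0, 0.05) for lit in preconditions}

print(selection_odds(PRIOR, backward, {"goal:answer-known"}))  # 10.0
print(selection_odds(PRIOR, forward, preconditions))           # 4.0
print(selection_odds(PRIOR, forward, preconditions - {"precondition-2"}))  # below 1
```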

STRATEGY ACQUISITION IN THE ADDITION DOMAIN

This section presents GIPS’ behavior through a series of different strategies for adding numbers. These strategy shifts arise from the learning algorithm incorporated into the system, and they correspond to actual strategies that children acquire when learning the task (Siegler & Jenkins, 1989).


THE  SUM  STRATEGY 

GIPS’ initial strategy for addition corresponds to the SUM strategy found in children. In this strategy, the problem solver adds two numbers by first raising the proper number of fingers on each hand (representing the addends) and then counting up the fingers. The first thing the system does is assign an addend to each hand. For example, for the problem 3 + 2, the system might assign the number 2 to the left hand (the first hand) and the number 3 to the right hand. However, in this strategy the order of the addends makes no difference, so it could just as easily have switched them.

In continuing the problem, the system uses a single counter together with its hands to generate an answer. The system raises its fingers and counts them one at a time until the counter value equals the value of the appropriate addend. This indicates that an END-COUNT operator should execute. We feel that the counter plays this role because, after representing one addend, children reset their count to zero in order to represent the second. If the counter were not being used to stop the count, it would not have to be reset between hands.

As the system solves new addition problems, it updates the execution concepts for the END-COUNT operators. It soon notices a number of relations that are always true when these operators execute. The most important of these is that the number of raised fingers is equal to the counter value. This and other relations are added to the preconditions of the END-COUNT operators (see Table 1(b)). This action alone does not change the system’s outward behavior, but it proves important for subsequent strategies.

THE  SHORTCUT  SUM  STRATEGY 

After some time, the new literals in the system’s execution concepts for LEFT-END-COUNT and RIGHT-END-COUNT become so strong that the system attempts to execute these operators earlier than usual. At this point, GIPS predicts that an operator should execute when the number of fingers raised on a hand equals the goal value, even though the system has not yet incremented its count for the last finger. It turns out that the system can still solve the addition problem successfully when it executes the operator prematurely, so it deletes from the preconditions of the END-COUNT operators the condition that the current counter value must equal the goal value (see Table 1(c)).

This change has a direct effect on GIPS’ behavior. It continues to increment its counter every time it raises a finger, but it no longer resets the counter when it is done representing an addend, because the value of the counter is no longer used as the termination criterion for counting a hand. Because the counter is not reset between hands, there is no need to go back and count up all the fingers on both hands after the addends have been represented. This behavior corresponds to the SHORTCUT SUM strategy used by children.

THE  “FIRST”  STRATEGY 

The next strategy shift occurs in a similar way. As GIPS attempts to execute the END-COUNT operators at various times with feedback from the user, it develops a “good” concept for when the END-COUNT operators are executable. One important part of this concept is that the goal value for counting fingers on a hand is always equal to one of the addends when LEFT-END-COUNT executes.

Eventually, the system attempts to fire the LEFT-END-COUNT operator without having raised any fingers at all. When it succeeds, it deletes the precondition that the number of fingers raised on the hand be equal to the goal value (see Table 1(d)). The system has learned that it can simply start counting from the goal value for the left hand rather than from zero. Note that there is no way the system could have jumped to this strategy directly from the initial strategy. This indicates that a noise-tolerant, reinforcement-based approach is appropriate to account for this series of strategies. GIPS also attempts to execute the RIGHT-END-COUNT operator early, but this leads to failure. Thus, the system begins to exhibit the FIRST strategy, in which the first (or left-hand) number is simply announced and then used as the starting point for counting on the second number, as in the SHORTCUT SUM strategy.

THE  MIN  STRATEGY 

The final strategy that GIPS generates is the MIN strategy. MIN is similar to the FIRST strategy, except that the system learns not to assign the addends arbitrarily to its hands. Rather, it starts with the larger addend and continues counting with the smaller one, which results in less work. In GIPS, the knowledge required to generate this strategy can be learned during the SHORTCUT SUM or FIRST strategies. In both of these strategies, when the problem solver is representing an addend on the right (or second) hand, the counter value is not equal to the number of fingers that are raised.
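To make the differences in work concrete, the four strategies can be written as plain counting procedures. These are hand-written descriptions of the children’s observable behavior, not GIPS operators; `steps` counts how many times a number is said while counting, and the example uses the assignment from the SUM description (2 on the left hand, 3 on the right).

```python
def sum_strategy(a, b):
    """SUM: count out each addend from zero, then count all raised fingers."""
    steps = 0
    for addend in (a, b):
        counter = 0                 # counter is reset for each hand
        while counter < addend:     # raise one finger per count
            counter += 1
            steps += 1
    total = 0
    for _ in range(a + b):          # recount every raised finger
        total += 1
        steps += 1
    return total, steps


def shortcut_sum_strategy(a, b):
    """SHORTCUT SUM: one counter runs across both hands without a reset."""
    counter = steps = 0
    for addend in (a, b):
        raised = 0                  # raised fingers, not the counter, stop each hand
        while raised < addend:
            raised += 1
            counter += 1
            steps += 1
    return counter, steps


def first_strategy(a, b):
    """FIRST: announce the first addend, then count on the second."""
    counter, steps, raised = a, 0, 0
    while raised < b:
        raised += 1
        counter += 1
        steps += 1
    return counter, steps


def min_strategy(a, b):
    """MIN: announce the larger addend, then count on the smaller one."""
    return first_strategy(max(a, b), min(a, b))


for strategy in (sum_strategy, shortcut_sum_strategy, first_strategy, min_strategy):
    print(strategy.__name__, strategy(2, 3))
# sum_strategy (5, 10)
# shortcut_sum_strategy (5, 5)
# first_strategy (5, 3)
# min_strategy (5, 2)
```

All four give the same answer; only the amount of counting shrinks, which is why the MIN strategy is the most economical.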

We hypothesize that a student may sometimes get mixed up or lose count, and fail to solve the problem because he “loses his place” when representing the right-hand addend. This type of interference would be more likely to occur for larger numbers. Thus, the solver would learn to prefer counting the smaller number on the right hand, because doing so leads to fewer failures of this type. This type of failure would not occur with the left hand, because in the SHORTCUT SUM strategy the number of raised fingers on that hand is always equal to the value of the counter.

We simulated this hypothesis in GIPS by causing it to fail occasionally during the SHORTCUT SUM strategy when it decided to count the larger addend on its right hand. This caused the system to update the selection concept for the operator that assigns numbers to hands, so that it came to prefer assigning the smaller addend to the right hand. With this experience, the system developed the MIN strategy rather than the FIRST strategy.
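The effect of these injected failures on the assignment operator’s selection concept can be approximated with count-based estimates of sufficiency and necessity. In this sketch, the single literal “larger addend on the right hand”, the every-second-attempt failure schedule, and the problem list are all hypothetical stand-ins for the authors’ simulation, whose details the text does not give.

```python
def simulate_interference(problems, fail_every=2):
    """Return [pos_present, pos_total, neg_present, neg_total] for the literal
    'larger addend on right hand' in a hypothetical ASSIGN selection concept."""
    counts = [1, 2, 1, 2]          # independence pseudo-counts (S = N = 1)
    larger_right = 0
    for left, right in problems:   # addends assigned in the order given
        present = right > left
        success = True
        if present:
            larger_right += 1
            # Injected failure: "lose one's place" on every fail_every-th try.
            success = larger_right % fail_every != 0
        offset = 0 if success else 2
        counts[offset] += int(present)
        counts[offset + 1] += 1
    return counts


def likelihood_ratios(counts):
    """Return (sufficiency, necessity) from presence counts."""
    pp, pt, npr, nt = counts
    p_pos, p_neg = pp / pt, npr / nt
    return p_pos / p_neg, (1 - p_pos) / (1 - p_neg)


problems = [(2, 3), (3, 2), (1, 4), (4, 1), (2, 5), (5, 2)] * 2
s, n = likelihood_ratios(simulate_interference(problems))
print(s < 1 < n)  # True: the literal now argues against larger-on-right
```

A sufficiency below 1 lowers the odds of an assignment that puts the larger addend on the right hand, while a necessity above 1 raises the odds of the opposite assignment, which is the preference that turns FIRST into MIN.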

DISCUSSION 

We have introduced a new problem-solving system that learns, called GIPS. Although most systems in this area use an “explanation-based” approach, GIPS incorporates a probabilistic, reinforcement learning algorithm. This learning algorithm has been shown to account for some low-level learning behaviors (Rescorla, 1968; Schlimmer, 1986), and our success with GIPS indicates that it is also suitable for modeling the higher-level learning required for problem solving.

GIPS uses this probabilistic approach to learn selection and execution concepts for its operators. Subtle changes in these concepts sometimes result in dramatic shifts in strategy while solving problems. In the domain of addition, we have demonstrated that the strategies acquired by GIPS match strategies acquired by children. This argues that the learning and reasoning mechanisms incorporated into GIPS correspond (at some level) to cognitive mechanisms in children. One other computational system models the SUM-to-MIN transition (Neches, 1987), but GIPS appears to provide a much better account of the psychological data found by Siegler and Jenkins (1989).

This study concentrated on providing a qualitative account of a set of general psychological behaviors. Our next step is to use GIPS to model human behavior at a smaller grain size. Even within general strategies, there is a large amount of room for individual differences among subjects, and we feel that the mechanisms in GIPS are ideal for capturing these sometimes subtle distinctions.